This notebook presents the plotting scripts and statistical analyses reported in the paper “Molecular determinants of dexamethasone vascular transport in COVID-19 therapy” by Shabalin et al. The plots shown here are interactive versions of the figures in the manuscript.
The data for the study was taken from the supplementary materials of “An interpretable mortality prediction model for COVID-19 patients”, Nat. Mach. Intell. 2, 283–288 (2020) by Yan et al. It describes COVID-19 patients admitted to Tongji Hospital, Wuhan, China, between January 10 and February 18, 2020. The raw dataset contained information about 375 patients. However, only the 373 patients who had their albumin levels measured at least once during their hospital stay were of interest to the current study. The table below presents basic statistics of the clinical variables analyzed in this study. Unless stated otherwise, these and other statistics are calculated from the last available blood sample of each patient.
| No | Variable | Stats / Values | Freqs (% of Valid) | Missing |
|---|---|---|---|---|
| 1 | Outcome [factor] | 1. Died, 2. Survived | 174 (46.7%), 199 (53.3%) | 0 (0%) |
| 2 | Gender [factor] | 1. Male, 2. Female | 222 (59.5%), 151 (40.5%) | 0 (0%) |
| 3 | Age [numeric] | Mean (sd): 58.8 (16.5); min < med < max: 18 < 62 < 95; IQR (CV): 24 (0.3) | 71 distinct values | 0 (0%) |
| 4 | TimeSinceAdmission [numeric] (days) | Mean (sd): 7.7 (5.9); min < med < max: 0 < 7.3 < 21.8; IQR (CV): 10.5 (0.8) | 342 distinct values | 0 (0%) |
| 5 | Albumin [numeric] (g/L) | Mean (sd): 32.6 (6.3); min < med < max: 13.6 < 33 < 47.6; IQR (CV): 9.7 (0.2) | 183 distinct values | 0 (0%) |
| 6 | Glucose [numeric] (mmol/L) | Mean (sd): 8.5 (5.2); min < med < max: 1 < 6.5 < 38.8; IQR (CV): 4.7 (0.6) | 284 distinct values | 0 (0%) |
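Each row of the table follows the same recipe. First, keep the last available sample of each patient. Then report the mean (sd), min < median < max, the interquartile range and the coefficient of variation (CV = sd / mean). The Python sketch below illustrates that recipe; it is not the notebook's R code. The record layout `(patient_id, time_since_admission, value)` and the function names are hypothetical. The quartiles use `statistics.quantiles(..., method="inclusive")`, which applies the same linear interpolation as R's default quantile definition (type 7).

```python
import statistics

def last_samples(records):
    """Return {patient_id: value} from each patient's latest non-missing sample.

    `records` is a hypothetical iterable of (patient_id, time_since_admission, value)
    tuples; value is None when the variable was not measured in that sample.
    Patients with no measurement at all are dropped.
    """
    latest = {}
    for pid, t, value in records:
        if value is None:
            continue
        if pid not in latest or t > latest[pid][0]:
            latest[pid] = (t, value)
    return {pid: value for pid, (_, value) in latest.items()}

def describe(values):
    """Numeric summary in the style of the table: mean (sd), min/med/max, IQR (CV)."""
    values = list(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": mean,
        "sd": sd,
        "min": min(values),
        "median": statistics.median(values),
        "max": max(values),
        "iqr": q3 - q1,
        "cv": sd / mean,  # coefficient of variation
        "distinct": len(set(values)),
    }
```

Applied as `describe(last_samples(records).values())`, a patient whose samples never include the variable simply does not appear. The same filter reduced 375 patients to 373 for albumin here.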
The overall mortality rate was 46.65%, whereas the mortality rate among male and female patients was 56.76% and 31.79%, respectively.
The sina plot below presents the distribution of albumin levels among the two outcome groups of patients (Died, Survived), with the mean and standard deviation overlaid. The horizontal dashed lines represent the normal range for albumin (35-55 g/L).
To verify whether the albumin levels in both outcome groups are normally distributed, we first inspected the Q-Q plots presented below. The shaded areas represent 95% confidence bands.
The Q-Q plots showed no clear departure from normality. We then additionally checked the albumin distributions with two Shapiro-Wilk tests at \(\alpha = 0.05\). The null hypothesis of the test is that the data are normally distributed. Therefore, if we cannot reject the null hypothesis, we assume that the distributions are normal.
| Field | Value |
|---|---|
| statistic.W | 0.995438624983042 |
| p.value | 0.876037297780011 |
| method | Shapiro-Wilk normality test |
| data.name | Outcome == Died |

| Field | Value |
|---|---|
| statistic.W | 0.991932747653017 |
| p.value | 0.338673164623577 |
| method | Shapiro-Wilk normality test |
| data.name | Outcome == Survived |
Both Shapiro-Wilk p-values (0.876 for Died, 0.339 for Survived) are above 0.05. We therefore cannot reject the null hypothesis that albumin levels are normally distributed, and we treat the albumin levels of both outcome groups as normal. Having checked normality, we performed a two-tailed Welch's t-test to determine whether the difference in mean albumin levels is statistically significant. Welch's test does not assume equal variances in the two groups.
| Field | Value |
|---|---|
| statistic.t | -19.1881248432455 |
| parameter.df | 338.060954028682 |
| p.value | 4.94695446316735e-56 |
| conf.int1 | -9.91937925851427 |
| conf.int2 | -8.07476964693251 |
| estimate.mean of x | 27.7752873563218 |
| estimate.mean of y | 36.7723618090452 |
| null.value.difference in means | 0 |
| stderr | 0.468887633691339 |
| alternative | two.sided |
| method | Welch Two Sample t-test |
| data.name | Outcome == Died vs Outcome == Survived |
With p < 0.001, we reject the null hypothesis that the means are equal. The difference in mean albumin levels between patients who died and those who survived is statistically significant. The mean albumin level was 36.77 g/L for patients who survived and 27.78 g/L for those who died, a difference of 9.00 g/L (95% CI 8.07–9.92 g/L).
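The Python sketch below shows which quantities the Welch test reports: the t statistic, the Welch-Satterthwaite degrees of freedom (which explain the non-integer df of 338.06) and the standard error. It is an illustration, not the notebook's code; the function name `welch_t` is ours. The p-value and confidence interval additionally require the Student t distribution. In practice they come from, for example, R's `t.test()` (which applies Welch's correction by default) or `scipy.stats.ttest_ind(x, y, equal_var=False)`.

```python
import math
import statistics

def welch_t(x, y):
    """Welch's t statistic, Welch-Satterthwaite df, and standard error of mean(x) - mean(y)."""
    n1, n2 = len(x), len(y)
    v1 = statistics.variance(x) / n1  # squared standard error of mean(x)
    v2 = statistics.variance(y) / n2  # squared standard error of mean(y)
    stderr = math.sqrt(v1 + v2)
    t = (statistics.mean(x) - statistics.mean(y)) / stderr
    # Welch-Satterthwaite approximation; generally not an integer
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df, stderr

# Cross-check with the reported output: t = (mean of x - mean of y) / stderr
mean_died, mean_survived, reported_se = 27.7752873563218, 36.7723618090452, 0.468887633691339
print(round((mean_died - mean_survived) / reported_se, 4))  # -19.1881
```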
The difference in albumin levels was also evaluated separately for each gender. Within each gender, the albumin distribution shapes and means are practically identical to those of the overall patient cohort. However, the proportion of patients who died differs between genders. This reflects the previously mentioned difference in mortality rates: 56.76% for males and 31.79% for females.
Shapiro-Wilk tests did not reject normality in any of the four gender and outcome subgroups (all p > 0.2). Within each gender, the difference in mean albumin levels between patients who died and those who survived was statistically significant according to two-tailed Welch's t-tests at \(\alpha = 0.05\) (p < 0.001 for both males and females).
| Field | Value |
|---|---|
| statistic.W | 0.994896232157248 |
| p.value | 0.933768572264352 |
| method | Shapiro-Wilk normality test |
| data.name | Male: Outcome == Died |

| Field | Value |
|---|---|
| statistic.W | 0.982301118999149 |
| p.value | 0.222347078792031 |
| method | Shapiro-Wilk normality test |
| data.name | Male: Outcome == Survived |

| Field | Value |
|---|---|
| statistic.t | -13.7391493435378 |
| parameter.df | 212.274917998268 |
| p.value | 3.75705427781707e-31 |
| conf.int1 | -9.75071978098348 |
| conf.int2 | -7.30384371108002 |
| estimate.mean of x | 28.0039682539683 |
| estimate.mean of y | 36.53125 |
| null.value.difference in means | 0 |
| stderr | 0.620655728590837 |
| alternative | two.sided |
| method | Welch Two Sample t-test |
| data.name | Male: Outcome == Died vs Outcome == Survived |

| Field | Value |
|---|---|
| statistic.W | 0.970911382714337 |
| p.value | 0.274776893133905 |
| method | Shapiro-Wilk normality test |
| data.name | Female: Outcome == Died |

| Field | Value |
|---|---|
| statistic.W | 0.983523008743715 |
| p.value | 0.230027042682121 |
| method | Shapiro-Wilk normality test |
| data.name | Female: Outcome == Survived |

| Field | Value |
|---|---|
| statistic.t | -11.9560304977694 |
| parameter.df | 71.3045086774262 |
| p.value | 1.05671199130413e-18 |
| conf.int1 | -11.460025381254 |
| conf.int2 | -8.1841493760276 |
| estimate.mean of x | 27.175 |
| estimate.mean of y | 36.9970873786408 |
| null.value.difference in means | 0 |
| stderr | 0.821517424238191 |
| alternative | two.sided |
| method | Welch Two Sample t-test |
| data.name | Female: Outcome == Died vs Outcome == Survived |
Apart from studying the patients’ final albumin samples, we have also analyzed how albumin levels changed over time. The line plot below presents the albumin level of each patient over time. The median linear trends between the first and last albumin levels recorded were:
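\[ \text{albumin}_{\text{Died}} = -0.08x + 27.65 \] \[ \text{albumin}_{\text{Survived}} = 0.02x + 37.2 \]
where \(x\) is the time since hospital admission in days and albumin is given in g/L.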
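The notebook does not state exactly how the median trend lines were derived. One plausible reading, sketched below in Python as an assumption rather than the authors' code, works within each outcome group. Draw a line through each patient's first and last albumin measurement, then take the median slope and the median intercept. The input layout and the function name `median_trend` are hypothetical.

```python
import statistics

def median_trend(patients):
    """Median slope and intercept of per-patient lines through the first and last samples.

    `patients` is a hypothetical list of (t_first, albumin_first, t_last, albumin_last)
    tuples, with times in days since admission and t_last > t_first.
    """
    slopes, intercepts = [], []
    for t0, a0, t1, a1 in patients:
        slope = (a1 - a0) / (t1 - t0)       # change in albumin (g/L) per day
        slopes.append(slope)
        intercepts.append(a0 - slope * t0)  # line extrapolated back to day 0
    return statistics.median(slopes), statistics.median(intercepts)
```

Applied to the reported Died trend, a slope of -0.08 g/L per day corresponds to an expected decline of about 0.08 × 14 ≈ 1.1 g/L over a 14-day stay.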
According to Pearson's correlation coefficient, the correlation between albumin levels and days since hospital admission is statistically significant only for patients who died (r = -0.22, p < 0.001). For patients who survived, it is not significant (r = -0.07, p = 0.18).
| Field | Value |
|---|---|
| statistic.t | -4.1374243606386 |
| parameter.df | 346 |
| p.value | 4.4145833851774e-05 |
| estimate.cor | -0.217123089822271 |
| null.value.correlation | 0 |
| alternative | two.sided |
| method | Pearson’s product-moment correlation |
| data.name | Outcome == Died |
| conf.int1 | -0.315062363944481 |
| conf.int2 | -0.114608174533471 |

| Field | Value |
|---|---|
| statistic.t | -1.35690184640133 |
| parameter.df | 396 |
| p.value | 0.175585343440854 |
| estimate.cor | -0.0680289181983126 |
| null.value.correlation | 0 |
| alternative | two.sided |
| method | Pearson’s product-moment correlation |
| data.name | Outcome == Survived |
| conf.int1 | -0.165222100448212 |
| conf.int2 | 0.0304728979060984 |
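The significance test behind these outputs converts Pearson's r into a t statistic with n - 2 degrees of freedom. R's `cor.test()` reports df = n - 2, so the tables imply 348 measurements for the Died group and 398 for the Survived group. That is exactly two per patient, consistent with using each patient's first and last albumin level. The Python sketch below (function names are ours) shows the computation and reproduces the reported t for the Died group.

```python
import math

def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_t(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# Cross-check with the "Died" output: df = 346, hence n = 348 measurements
print(round(correlation_t(-0.217123089822271, 346 + 2), 4))  # -4.1374
```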
Another variable taken into account was the patient’s glucose level. The scatter plot below presents the relation between albumin levels (y-axis) and glucose levels (x-axis), with color denoting the outcome group (red: Died, blue: Survived). The horizontal dashed lines represent the normal range for albumin (35-55 g/L). The vertical dashed lines show the normal fasting glucose range (4.0-5.5 mmol/L) and the diabetes threshold for a random plasma glucose test (11.1 mmol/L).
The plot suggests that both albumin and glucose levels may be associated with COVID-19 outcome. This relation is examined further in the logistic regression analysis.
Similarly, the scatter plot below presents the relation between albumin levels (y-axis) and age (x-axis), with color denoting the outcome group (red: Died, blue: Survived).
Again, age appears to be associated with both albumin levels and COVID-19 outcome. This relation is also examined in the logistic regression analysis.
We used logistic regression to test whether the analyzed variables (albumin levels, glucose levels, gender and age) are significantly associated with the outcome (Died, Survived). We will first create unadjusted models to assess the relation between each single variable and the outcome. We will then create an adjusted model in which all variables are taken into account as potential confounders. Finally, we will check whether, apart from confounding, there is any effect modification (interaction) between albumin levels and the other variables.
Below are the results of the unadjusted logistic regression for each variable. The outcome variable OutcomeNumber equals 1 for patients who survived, so odds ratios above 1 indicate higher odds of survival. The predictor in every unadjusted model was statistically significant at \(\alpha = 0.05\), with p < 0.001.
| Predictors (outcome: OutcomeNumber) | Odds Ratios | CI | p |
|---|---|---|---|
| (Intercept) | 0.00 *** | 0.00 – 0.00 | <0.001 |
| Albumin | 1.56 *** | 1.44 – 1.71 | <0.001 |
| Observations | 373 | | |
| R² Tjur | 0.551 | | |

| Predictors (outcome: OutcomeNumber) | Odds Ratios | CI | p |
|---|---|---|---|
| (Intercept) | 0.76 * | 0.58 – 0.99 | 0.045 |
| Gender [Female] | 2.82 *** | 1.83 – 4.37 | <0.001 |
| Observations | 373 | | |
| R² Tjur | 0.060 | | |

| Predictors (outcome: OutcomeNumber) | Odds Ratios | CI | p |
|---|---|---|---|
| (Intercept) | 479.80 *** | 133.55 – 2023.14 | <0.001 |
| Age | 0.90 *** | 0.88 – 0.92 | <0.001 |
| Observations | 373 | | |
| R² Tjur | 0.332 | | |

| Predictors (outcome: OutcomeNumber) | Odds Ratios | CI | p |
|---|---|---|---|
| (Intercept) | 10.30 *** | 5.76 – 19.34 | <0.001 |
| Glucose | 0.76 *** | 0.70 – 0.81 | <0.001 |
| Observations | 373 | | |
| R² Tjur | 0.227 | | |
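With a single binary predictor, the unadjusted logistic regression simply reproduces the 2×2 table of gender by outcome. This gives a way to check the gender model by hand. The counts below are not listed in the notebook; they are back-calculated from the reported totals (222 males, 151 females, 174 deaths) and mortality rates. The Python sketch recovers the intercept (odds of survival for males, the reference level), the odds ratio for females and Tjur's R². It also confirms that OutcomeNumber = 1 corresponds to survival.

```python
# Outcome counts by gender, back-calculated from the reported totals and mortality rates.
died = {"Male": 126, "Female": 48}
survived = {"Male": 96, "Female": 103}

for g in died:
    total = died[g] + survived[g]
    # prints "Male 222 56.76", then "Female 151 31.79" (mortality rate, %)
    print(g, total, round(100 * died[g] / total, 2))

# Odds of survival (OutcomeNumber = 1) within each gender.
odds = {g: survived[g] / died[g] for g in died}
intercept_or = odds["Male"]                # exp(b0): odds for the reference level (Male)
female_or = odds["Female"] / odds["Male"]  # exp(b1): odds ratio, Female vs Male
print(round(intercept_or, 2), round(female_or, 2))  # 0.76 2.82

# Tjur's R^2: mean fitted P(survival) among survivors minus among non-survivors.
# With one binary predictor, the fitted probability equals the group survival rate.
p_hat = {g: survived[g] / (died[g] + survived[g]) for g in died}
mean_p_survived = sum(survived[g] * p_hat[g] for g in died) / sum(survived.values())
mean_p_died = sum(died[g] * p_hat[g] for g in died) / sum(died.values())
tjur = mean_p_survived - mean_p_died
print(round(tjur, 3))  # 0.06
```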
The adjusted model shows that albumin levels remain statistically significantly associated with COVID-19 outcome (OR = 1.51 per 1 g/L, 95% CI 1.37–1.69, p < 0.001), even after adjusting for glucose levels, age, and gender as potential confounders.
| Predictors (outcome: OutcomeNumber) | Odds Ratios | CI | p |
|---|---|---|---|
| (Intercept) | 0.00 *** | 0.00 – 0.02 | <0.001 |
| Albumin | 1.51 *** | 1.37 – 1.69 | <0.001 |
| Glucose | 0.89 ** | 0.81 – 0.96 | 0.006 |
| Age | 0.92 *** | 0.89 – 0.94 | <0.001 |
| Gender [Female] | 2.19 * | 1.04 – 4.72 | 0.041 |
| Observations | 373 | | |
| R² Tjur | 0.676 | | |
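Logistic regression is linear on the log-odds scale, so the odds ratio reported per 1 g/L compounds multiplicatively over larger differences. The worked illustration below uses the rounded adjusted estimate for albumin, so the result is only approximate. It compares two patients with the same glucose level, age, and gender whose albumin levels differ by \(\Delta\) g/L:

\[ \mathrm{OR}_{\Delta} = \left(e^{\beta_{\text{Albumin}}}\right)^{\Delta} = \mathrm{OR}_{1}^{\Delta}, \qquad \mathrm{OR}_{5} \approx 1.51^{5} \approx 7.85 \]

In other words, a 5 g/L higher albumin level corresponds to roughly 7.9 times higher odds of the modeled outcome (OutcomeNumber = 1), other variables held fixed.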
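Tests for interactions between albumin levels and the other variables did not show any significant effect modification at \(\alpha = 0.05\). The p-values of the interaction terms were 0.173 (gender), 0.772 (age), and 0.096 (glucose).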
| Predictors (outcome: OutcomeNumber) | Odds Ratios | CI | p |
|---|---|---|---|
| (Intercept) | 0.00 *** | 0.00 – 0.00 | <0.001 |
| Albumin | 1.48 *** | 1.35 – 1.66 | <0.001 |
| Gender [Female] | 0.03 | 0.00 – 13.43 | 0.285 |
| Albumin * Gender [Female] | 1.15 | 0.95 – 1.42 | 0.173 |
| Observations | 373 | | |
| R² Tjur | 0.570 | | |

| Predictors (outcome: OutcomeNumber) | Odds Ratios | CI | p |
|---|---|---|---|
| (Intercept) | 0.00 | 0.00 – 71.37 | 0.173 |
| Albumin | 1.68 * | 1.03 – 2.89 | 0.049 |
| Age | 0.95 | 0.74 – 1.24 | 0.722 |
| Albumin * Age | 1.00 | 0.99 – 1.01 | 0.772 |
| Observations | 373 | | |
| R² Tjur | 0.658 | | |

| Predictors (outcome: OutcomeNumber) | Odds Ratios | CI | p |
|---|---|---|---|
| (Intercept) | 0.00 *** | 0.00 – 0.00 | <0.001 |
| Albumin | 1.73 *** | 1.42 – 2.09 | <0.001 |
| Glucose | 1.44 | 0.70 – 2.39 | 0.242 |
| Albumin * Glucose | 0.98 | 0.97 – 1.01 | 0.096 |
| Observations | 373 | | |
| R² Tjur | 0.590 | | |
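To make the effect-modification test concrete, the first interaction model above can be written out explicitly. Female is a 0/1 indicator, with Male as the reference level, as the label Gender [Female] implies:

\[ \operatorname{logit} P(\text{OutcomeNumber} = 1) = \beta_0 + \beta_1\,\text{Albumin} + \beta_2\,\text{Female} + \beta_3\,(\text{Albumin} \times \text{Female}) \]

The odds ratio per 1 g/L of albumin is then \(e^{\beta_1} \approx 1.48\) for males and \(e^{\beta_1 + \beta_3} = e^{\beta_1} e^{\beta_3} \approx 1.48 \times 1.15 \approx 1.70\) for females. Effect modification would mean \(e^{\beta_3} \neq 1\). Its 95% CI (0.95 – 1.42) includes 1 (p = 0.173), so the data do not support a different albumin effect between genders. The age and glucose models read the same way, with the continuous variable in place of the indicator.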
Finally, for gender, the only categorical potential confounder, we plotted logistic regression curves with separate lines for males and females. Gender makes a difference only at high albumin levels. In other words, low albumin levels are an equally strong predictor of death from COVID-19 for males and females.
If you find this analysis useful, please cite: Ivan G. Shabalin, Mateusz P. Czub, Karolina A. Majorek, Dariusz Brzezinski, Marek Grabowski, David R. Cooper, Mateusz Panasiuk, Maksymilian Chruszcz, Wladek Minor, “Molecular determinants of dexamethasone vascular transport in COVID-19 therapy”, in review.